
484 research outputs found

Model Selection in Overlapping Stochastic Block Models

Author: Ambroise C.
Birmelé E.
Latouche P.
Publication date: 01/01/2014

Networks are a commonly used mathematical model to describe the rich set of interactions between objects of interest. Many clustering methods have been developed to partition such structures, several of which rely on underlying probabilistic models, typically mixture models. In several applications, however, the relevant hidden structure may show overlapping groups. The Overlapping Stochastic Block Model (2011) was developed to take this phenomenon into account. Nevertheless, the choice of the number of classes in the inference step remains an open problem. To tackle this issue, we consider the proposed model in a Bayesian framework and develop a new criterion based on a non-asymptotic approximation of the marginal log-likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm, and demonstrate its efficiency by running it on both simulated and real data.

arXiv.org e-Print Archive

The Training of Hydrologists (La formation des hydrologues)

Author: Ambroise B.
Desbordes M.
Dupouyet J.P.
Jaccon Gilbert
Pieyns Serge
Truchot C.
Publication venue: ORSTOM
Publication date: 01/01/1993

Horizon / Pleins textes

Determining appropriate approaches for using data in feature selection

Author: A Kalousis
C Ambroise
DW Aha
F Wilcoxon
G Chandrashekar
H Liu
J Reunanen
JC Platt
JR Quinlan
L Yu
M Lecocke
MA Hall
P Somol
V Bolón-Canedo
Y Han
Y Saeys
Z He
Publication venue: Springer Science and Business Media LLC
Publication date: 22/12/2015

Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small but starts to lose its advantage as the dataset size increases.
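The reliability measure named in this abstract can be illustrated with a minimal sketch. It assumes the Average Tanimoto Index is the mean pairwise Tanimoto (Jaccard) similarity between the feature subsets selected in different runs; the abstract does not give the exact definition, and the function names and sample subsets below are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty subsets count as identical
    return len(a & b) / len(a | b)


def average_tanimoto(subsets):
    """Mean pairwise Tanimoto similarity over a list of feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected in three runs of a feature selector.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
score = average_tanimoto(runs)
```

A score near 1 means the selector returns nearly the same features on every run; a score near 0 means the selections barely overlap.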

Crossref

Springer - Publisher Connector

University of East Anglia digital repository

An Important Component of National Security (Problems of Protecting Scientific and Technical Information)

Author: Ambroise Thomas P. (dir.)
Billot E.
Chaussade C.
Dehecq J.S.
Delatte Hélène
Domerg C.
Fohr G.
Fontenille Didier
Gaüzère B.A.
Thiria J.
Publication venue: Akademperiodyka Publishing House, National Academy of Sciences of Ukraine
Publication date: 01/01/2002

This article addresses the problem of protecting information resources in the scientific and technical sphere. It substantiates the significance of scientific and technological potential for the economic and social development of Ukraine, and argues that a thorough regulatory and legal framework needs to be developed.

Scientific Electronic Library of Periodicals of the NAS of Ukraine (Vernadsky National Library of Ukraine)

Infection with Toxoplasma gondii Does Not Alter TNFα and IL-6 Secretion by a Human Astrocytoma Cell Line

Author: Ambroise-Thomas P.
Meunier A.
Nissou M.-F.
Pelloux H.
Renversez J.-C.
Ricard J.
Vuillez J.-P.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/1994

The secretion of tumour necrosis factor-α (TNFα), interleukin-1α (IL-1α) and interleukin-6 (IL-6) by a human astrocytoma cell line was studied 1 h, 3 h, 6 h and 24 h after infection with tachyzoites from three Toxoplasma gondii strains (virulent, RH; cystogenic, 76K and Prugniaud strains). The astrocytoma cell line constitutively secreted TNFα and IL-6, but no IL-1α. A positive control was obtained by stimulation with phorbol esters, which induced a significant increase (p < 0.05) in TNFα and IL-6 secretion but not in IL-1α, while lipopolysaccharide (alone and after priming), interferon gamma, ionophore A23187 and sera positive to T. gondii did not induce any increase in cytokine levels. None of the tachyzoites, whatever their virulence, induced a significant increase in cytokine production at any time in the study. Tachyzoites did not inhibit the secretion induced by phorbol esters.

Crossref

Directory of Open Access Journals

PubMed Central

The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

Author: A Ivshina
Anne-Claire Haury
C Ambroise
C Fan
C Lai
C Sotiriou
F Reyal
G Abraham
H Zou
I Guyon
J Bi
J Mairal
J Wang
Jean-Philippe Vert
JPA Ioannidis
L Ein-Dor
M Dai
Muy-Teck Teh
N Meinshausen
P Wirapati
Pierre Gestraud
R Kohavi
R Shen
R Simon
R Tibshirani
RA Irizarry
S Michiels
T Abeel
T Barrett
T Iwamoto
W Shi
Y Benjamini
Y Pawitan
Y Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 23/06/2011

Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

HAL-MINES ParisTech

Identification of disease-causing genes using microarray data mining and gene ontology

Author: A Mohammadi
A Zhang
AA Alizadeh
Azadeh Mohammadi
B Duval
BF Souza
C Ambroise
C Ding
C Tago
D Lin
D Singh
E Martinez
FM Couto
I Guyon
I Inza
J Jaeger
JJ Jiang
L Li
L Yu
L Ziaei
Mansoor Salehi
Mohammad H Saraee
N Cristianini
P Pavlidis
P Resnik
PA Mundra
PJ Park
R Genuer
RF Weaver
S Li
TM Huang
TR Golub
TS Furey
U Alon
W Xu
Y Ding
Y Saeys
Y Wang
YL Chin
Z Xie
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.

University of Salford Institutional Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Complete structure of the chemosensory array core signalling unit in an E. coli minicell strain

Author: Ames Peter
Bacia-Verloop Maria
Baulard Megghane
Burt Alister
Cassidy C. Keith
Desfosses Ambroise
Gutsche Irina
Huard Karine
Luthey-Schulten Zaida
Margolin William
Parkinson John S.
Stansfeld Phillip J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Motile bacteria sense chemical gradients with transmembrane receptors organised in supramolecular signalling arrays. Understanding stimulus detection and transmission at the molecular level requires precise structural characterisation of the array building block known as a core signalling unit. Here we introduce an Escherichia coli strain that forms small minicells possessing extended and highly ordered chemosensory arrays. We use cryo-electron tomography and subtomogram averaging to provide a three-dimensional map of a complete core signalling unit, with visible densities corresponding to the HAMP and periplasmic domains. This map, combined with previously determined high resolution structures and molecular dynamics simulations, yields a molecular model of the transmembrane core signalling unit and enables spatial localisation of its individual domains. Our work thus offers a solid structural basis for the interpretation of a wide range of existing data and the design of further experiments to elucidate signalling mechanisms within the core signalling unit and larger array.

Hal - Université Grenoble Alpes

Warwick Research Archives Portal Repository

HAL-CEA

The Dexi-SH* model for a multivariate assessment of agro-ecological sustainability of dairy grazing systems

Author: Ambroise R.
Astigarraga L.
Bockstaller C.
Colloque Dinabio
Coquil X.
Fiorelli J.-L.
Gerber M.
Hostiou N.
Ingrand S.
Marie M.
Peigné J.
Plantureux S.
Sadok Walid
Veysset P.
Publication venue: United Nations University
Publication date: 01/01/2009

Dexi-SH* is an ex ante multivariate model for assessing the sustainability of dairy cow grazing systems. The model is composed of three sub-models that evaluate the impact of the systems on: (i) biotic resources; (ii) abiotic resources; and (iii) pollution risks. The structure of the hierarchical tree was inspired by that of the Masc model. The choice of criteria and their aggregation modalities were discussed within a multi-disciplinary group of scientists. For each cluster, a utility function was established in order to determine weighting and priority functions between criteria. The model can take local and regional conditions and standards into account by adjusting criterion categories to the agroecological context, and can reflect the specific views of decision makers by changing the weighting of criteria.

Organic Eprints

DIAL UCLouvain

Optimality Driven Nearest Centroid Classification from Genomic Data

Author: A Alizadeh
Alan R. Dabney
AR Dabney
B Efron
C Ambroise
C Stein
D Ross
I Hedenfalk
J Khan
J Schäfer
Ji Zhu
John D. Storey
JW Lee
K Mardia
P Bickel
R Shen
R Tibshirani
RJ McKay
S Dudoit
T Golub
TH Bø
Y Guo
Publication venue: Public Library of Science
Publication date: 01/01/2007

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
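For readers unfamiliar with the baseline this abstract builds on, a minimal sketch of a plain nearest-centroid classifier restricted to a chosen feature subset follows. It is not the paper's method: it assumes class-mean centroids and Euclidean distance, with no greedy feature selection and no shrinkage estimates, and the function names and toy data are hypothetical.

```python
import math


def fit_centroids(X, y, features):
    """Compute the per-class mean (centroid) over the selected feature indices."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, j in enumerate(features):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row, features):
    """Assign the row to the class whose centroid is closest in Euclidean distance."""
    point = [row[j] for j in features]
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Toy data: feature 1 is pure noise, so only features 0 and 2 are used.
X = [[0.0, 5.0, 1.0], [0.2, -3.0, 0.9], [1.0, 4.0, 0.1], [0.8, -4.0, 0.0]]
y = ["a", "a", "b", "b"]
selected = [0, 2]

centroids = fit_centroids(X, y, selected)
label = predict(centroids, [0.1, 100.0, 1.0], selected)
```

Because the noisy feature is excluded, the extreme value 100.0 in the query has no effect on the predicted class, which is the motivation for selecting features before forming centroids.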

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

Texas A&M Repository